1 University of One Place
2 University of Another Place

Correspondence: Yichun Chen <>

1 Introduction

Here is a citation (Marwick, 2017)

Gender under-representation and inequality have long existed, from children’s picture books to academic publications (Hamilton et al. 2006; Tushingham et al. 2017). Inequality in academic fields, including archaeology, usually refers to the disproportion between men and women in the field. Women are often underrepresented because of the lack of support they receive in academic fields (Xu 2008). In archaeology, gender imbalance persists and seems to affect the kinds of publications produced in the field (Bardolph et al. 2016). For example, 78% of the main authors in the Journal of Field Archaeology are men (Heath-Stout 2020). Looking at gender ratios among the membership of the Society for California Archaeology from 1967 to 2016, Tushingham et al. (2017) report a trend of women increasingly maintaining their society membership, and the gender gap has almost closed. In this study we investigate gender ratios of presenters at major archaeological conferences. We extend the work of Tushingham et al. (2017) with a study of gender ratios of presenters at meetings of the Society for American Archaeology (SAA, 2016-2019), the European Association of Archaeologists (EAA, 2018) and Computer Applications and Quantitative Methods in Archaeology (CAA, 2017). We explore how gender ratios vary over time and between these conferences, and we examine the influence of the presenter’s gender on the topics they present.

2 Background

3 Methods

We collected lists of the people who presented research at recent SAA (2018, with 2930 presenters), EAA (2018, with 2928 presenters) and CAA (2018, with 358 presenters) conferences. We obtained publicly available programs of these conferences and tidied the data into a rectangular form with variables including the primary author’s first name, last name and the title of their presentation. We used R and the gender package (Mullen 2018; Mihaljevic 2019; Belvis et al. 2015) to predict the gender of all first-named presenters in our sample. The package identifies a first name as male or female on a probability scale computed from government records of name-gender data, specifically US Social Security Administration (SSA) baby name data (Mullen 2019). This method has the advantage of speed and ease of automation, but also some substantial limitations. We can only infer binary male/female genders and assign first names probabilities of belonging to these two categories. This has the unfortunate result of excluding other genders from the results, and we encourage conference organizers to collect gender data directly from participants to improve the representation of minority genders in future studies. The package often fails to classify non-English names, because the SSA data it uses consist mostly of English names. This means that people with non-English names are underrepresented in our results. To remove names where gender is ambiguous, which we define as a similar proportion of males and females having that name in the SSA data, we excluded names where the difference in proportions was less than 0.47, a figure we obtained by exploratory data analysis. It is important to note that the gender designations here are not self-identified by the presenters, but are computed probabilistically. Better-quality data would come from presenters self-identifying their gender, but such data are not currently available, so we have no alternative.
Our hope is that this work, despite its many limitations, will stimulate the collection of more reliable, justifiable, and useful data by conference organizers.
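
The ambiguity filter described above can be illustrated with a minimal sketch. It is written in Python rather than R, and it is not the code used in this study: the lookup table, its proportions and the function name `classify_first_name` are hypothetical stand-ins for the SSA-derived values that a name-gender lookup would return. Only the 0.47 threshold is taken from the text.

```python
# Hypothetical name-level proportions (illustrative values only), standing in
# for the SSA-derived proportions that a name-gender lookup would return.
NAME_PROPORTIONS = {
    "mary": {"male": 0.01, "female": 0.99},
    "james": {"male": 0.99, "female": 0.01},
    "jordan": {"male": 0.70, "female": 0.30},
}

# Threshold reported in the text: names whose male/female proportions differ
# by less than this value are treated as ambiguous and excluded.
AMBIGUITY_THRESHOLD = 0.47


def classify_first_name(name, proportions=NAME_PROPORTIONS,
                        threshold=AMBIGUITY_THRESHOLD):
    """Return 'male', 'female', or None for an unknown or ambiguous name."""
    entry = proportions.get(name.strip().lower())
    if entry is None:
        return None  # name absent from the reference data
    difference = abs(entry["male"] - entry["female"])
    if difference < threshold:
        return None  # proportions too similar: ambiguous name
    return "male" if entry["male"] > entry["female"] else "female"


if __name__ == "__main__":
    for first_name in ["Mary", "James", "Jordan", "Xiuying"]:
        print(first_name, classify_first_name(first_name))
```

In this sketch a name missing from the reference table and a name with similar proportions are both returned as unclassified, which mirrors the two sources of exclusion described in the paragraph above.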

We are also interested in whether topic choices vary between these two genders at archaeological conferences. To investigate this we used topic modeling (latent Dirichlet allocation, LDA), a method that automatically finds groups of related words that resemble ‘themes’ or ‘topics’. Topics are not mutually exclusive sets of words: a word can be present in many topics, but with different weightings. When generating a topic model, we first have to decide on the number of topics that the method will identify in our texts. We then fit a topic model to all our texts. This distributes the words in every document across the chosen number of topics and gives each word a per-topic probability. To visualise the topic model we use the top 10 words within each topic as keywords to represent that topic. We excluded words that are common in the field of archaeology or have little semantic value, for example, ‘the’, ‘a’, ‘el’, ‘al’, ‘archaeology’, etc. We generated one topic model per conference. Each presentation at each conference is then assigned a vector of topic probabilities.
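
As an illustration of how keywords can be read off a fitted topic model, the following Python sketch ranks the words of each topic by probability and keeps the highest-ranked ones. The topic names, words and probabilities are invented for the example, and the function `top_keywords` is a hypothetical helper, not part of any topic modeling package. For brevity the sketch drops excluded words at display time, whereas the text only states that such words were excluded.

```python
# Words excluded in the text because they are common or carry little meaning.
STOP_WORDS = {"the", "a", "el", "al", "archaeology"}

# Hypothetical per-topic word probabilities (illustrative values only).
# Note that "ceramic" appears in both topics with different weightings.
TOPICS = {
    "topic_1": {"ceramic": 0.30, "isotope": 0.25, "the": 0.20,
                "diet": 0.15, "bone": 0.10},
    "topic_2": {"survey": 0.35, "gis": 0.30, "archaeology": 0.20,
                "ceramic": 0.10, "landscape": 0.05},
}


def top_keywords(word_probs, n=10, stop_words=STOP_WORDS):
    """Return up to n words with the highest probability in one topic."""
    kept = [(word, p) for word, p in word_probs.items()
            if word not in stop_words]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in kept[:n]]


if __name__ == "__main__":
    for name, word_probs in TOPICS.items():
        print(name, top_keywords(word_probs, n=3))
```

With real data, `n` would be 10 to match the number of keywords used to represent each topic in this study.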

We then tried another approach to topic modeling, the Structural Topic Model (STM), which is very similar to the LDA described above. Its main advantage for us is that the STM allows us to include metadata alongside the normal topic modeling process. The metadata are “attached” to the collection of texts on which we then perform topic modeling. We used the stm package, which also includes the ability to estimate the number of topics best suited to the STM (Roberts 2019). Including metadata allows us to perform covariate analysis (Roberts et al. 2013), that is, to examine the relationship between the topics and covariates of interest. We define male and female gender categories and then analyze the prevalence of these two categories across the topics that the stm package identified (Roberts 2019). The analysis also uses the probabilities assigned to the texts to see whether the most frequently occurring text appears to be “attached” to one gender more than the other.

To explore the relationship between gender and topics we identified the most prominent topic for each presentation…
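
One simple way to pick the most prominent topic for a presentation is to take the topic with the highest probability in its topic vector, and then to count those topics separately for each gender. The Python sketch below shows this under stated assumptions: the data are invented, the function names are hypothetical, and this plain tally is a stand-in for illustration, not the covariate analysis performed with the stm package.

```python
from collections import Counter


def dominant_topic(topic_probs):
    """Return the index of the highest-probability topic for one presentation."""
    return max(range(len(topic_probs)), key=lambda k: topic_probs[k])


def topic_counts_by_gender(presentations):
    """Tally dominant topics per inferred gender.

    `presentations` is a list of (gender, topic_probs) pairs, where
    topic_probs is the vector of topic probabilities for one presentation.
    """
    counts = {}
    for gender, topic_probs in presentations:
        counts.setdefault(gender, Counter())[dominant_topic(topic_probs)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical presentations with three topics each (illustrative only).
    example = [
        ("female", [0.1, 0.7, 0.2]),
        ("male", [0.6, 0.3, 0.1]),
        ("female", [0.2, 0.5, 0.3]),
    ]
    for gender, tally in topic_counts_by_gender(example).items():
        print(gender, dict(tally))
```

A tally like this discards the rest of each topic vector, so it is only a coarse summary of the kind of relationship examined here.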

4 Results

In the SAA data we found a total of 2930 presentations. Of these, we could identify the first-named author of 2608 (89%) presentations as either a man (n=1246) or a woman (n=1362). For the CAA meeting we have 358 presentations, of which 318 (89%) first-named authors could be classified as either men (n=194) or women (n=124). For the EAA data we have 2928 presentations, of which 2237 (76%) had a first author who could be classified as either a man (n=1016) or a woman (n=1221). This results in similar women-to-men ratios for the SAA (1.1) and EAA (1.2), but a much lower ratio for the CAA (0.6).
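
The percentages and ratios above follow directly from the reported counts. The short Python sketch below reproduces the arithmetic; the counts are those given in the text, while the dictionary layout and the function name `summarise` are only illustrative.

```python
# Counts reported in the text for each conference.
COUNTS = {
    "SAA": {"total": 2930, "men": 1246, "women": 1362},
    "CAA": {"total": 358, "men": 194, "women": 124},
    "EAA": {"total": 2928, "men": 1016, "women": 1221},
}


def summarise(counts):
    """Compute the classified share and the women-to-men ratio."""
    classified = counts["men"] + counts["women"]
    return {
        "classified": classified,
        "percent_classified": round(100 * classified / counts["total"]),
        "women_to_men": round(counts["women"] / counts["men"], 1),
    }


if __name__ == "__main__":
    for conference, counts in COUNTS.items():
        print(conference, summarise(counts))
```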

In the SAA presentations we identified the optimum number of topics as 35, with 17 of these showing non-random covariance with the gender of the first-named presenter. One of the main differences between the topics associated with each gender is the locations that presenters are interested in: females seem more likely to present on a specific region, such as Mesoamerica, Arizona and national parks, while males seem more likely to present on topics in the western regions.

For the EAA we generated a total of 37 topics, with 16 of these showing non-random covariance with the gender of the first-named presenter. Although the red-marked topics for both females and males include “mediterranean”, females seem to focus on the western Mediterranean while males seem to focus on the northern Mediterranean. Another main difference is that males tend to present on topics related to architecture, while the topic concerning research methods that use isotopes seems to be more popular among females.

For the CAA we generated a total of 27 topics, with 6 of these showing non-random covariance with the gender of the first-named presenter. In the CAA’s STM topic plot we can see a clearer pattern in the topics marked in red. Topics associated with Geographic Information Systems (GIS) are more likely to be presented by females. Other female-associated topics in the CAA topic plot include the words “women”, “excav” and “inform”. For males at the CAA, no specific word seems to be more strongly associated with male presenters apart from “environ”.

5 Discussion

6 Conclusion

7 Acknowledgements

8 References

Marwick, B., 2017. Computational reproducibility in archaeological research: Basic principles and a case study of their implementation. Journal of Archaeological Method and Theory 24, 424–450. https://doi.org/10.1007/s10816-015-9272-9

8.0.1 Colophon

This report was generated on 2020-05-04 17:13:23 using the following computational environment and dependencies:

#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.0 (2019-04-26)
#>  os       macOS  10.15.2              
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/Los_Angeles         
#>  date     2020-05-04                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                             
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)                     
#>  backports     1.1.6      2020-04-05 [1] CRAN (R 3.6.2)                     
#>  bookdown      0.18       2020-03-05 [1] CRAN (R 3.6.0)                     
#>  broom         0.5.5      2020-02-29 [1] CRAN (R 3.6.0)                     
#>  callr         3.4.3      2020-03-28 [1] CRAN (R 3.6.2)                     
#>  cellranger    1.1.0.9000 2019-05-28 [1] Github (rsheets/cellranger@7ecde54)
#>  cli           2.0.2      2020-02-28 [1] CRAN (R 3.6.0)                     
#>  colorspace    1.4-1      2019-03-18 [1] CRAN (R 3.6.0)                     
#>  crayon        1.3.4.9000 2020-05-03 [1] Github (gaborcsardi/crayon@dcf6d44)
#>  data.table    1.12.8     2019-12-09 [1] CRAN (R 3.6.0)                     
#>  DBI           1.1.0      2019-12-15 [1] CRAN (R 3.6.0)                     
#>  dbplyr        1.4.2      2019-06-17 [1] CRAN (R 3.6.0)                     
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.6.0)                     
#>  devtools      2.3.0      2020-04-10 [1] CRAN (R 3.6.0)                     
#>  digest        0.6.25     2020-02-23 [1] CRAN (R 3.6.0)                     
#>  dplyr       * 0.8.5      2020-03-07 [1] CRAN (R 3.6.0)                     
#>  ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.6.0)                     
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)                     
#>  fansi         0.4.1      2020-01-08 [1] CRAN (R 3.6.0)                     
#>  farver        2.0.3      2020-01-16 [1] CRAN (R 3.6.0)                     
#>  forcats     * 0.5.0      2020-03-01 [1] CRAN (R 3.6.0)                     
#>  fs            1.4.1      2020-04-04 [1] CRAN (R 3.6.2)                     
#>  generics      0.0.2      2018-11-29 [1] CRAN (R 3.6.0)                     
#>  ggplot2     * 3.3.0.9000 2020-05-03 [1] Github (tidyverse/ggplot2@8c66f51) 
#>  glue          1.4.0      2020-04-03 [1] CRAN (R 3.6.2)                     
#>  gtable        0.3.0      2019-03-25 [1] CRAN (R 3.6.0)                     
#>  haven         2.2.0      2019-11-08 [1] CRAN (R 3.6.0)                     
#>  here        * 0.1        2017-05-28 [1] CRAN (R 3.6.0)                     
#>  hms           0.5.3      2020-01-08 [1] CRAN (R 3.6.0)                     
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.6.0)                     
#>  httr          1.4.1      2019-08-05 [1] CRAN (R 3.6.0)                     
#>  jsonlite      1.6.1      2020-02-02 [1] CRAN (R 3.6.0)                     
#>  knitr         1.28       2020-02-06 [1] CRAN (R 3.6.0)                     
#>  labeling      0.3        2014-08-23 [1] CRAN (R 3.6.0)                     
#>  lattice       0.20-41    2020-04-02 [1] CRAN (R 3.6.2)                     
#>  lifecycle     0.2.0      2020-03-06 [1] CRAN (R 3.6.0)                     
#>  lubridate     1.7.8      2020-04-06 [1] CRAN (R 3.6.2)                     
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.0)                     
#>  Matrix        1.2-18     2019-11-27 [1] CRAN (R 3.6.0)                     
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)                     
#>  modelr        0.1.6      2020-02-22 [1] CRAN (R 3.6.0)                     
#>  munsell       0.5.0      2018-06-12 [1] CRAN (R 3.6.0)                     
#>  nlme          3.1-147    2020-04-13 [1] CRAN (R 3.6.2)                     
#>  pillar        1.4.3      2019-12-20 [1] CRAN (R 3.6.0)                     
#>  pkgbuild      1.0.7      2020-04-25 [1] CRAN (R 3.6.2)                     
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.6.0)                     
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)                     
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 3.6.0)                     
#>  processx      3.4.2      2020-02-09 [1] CRAN (R 3.6.0)                     
#>  ps            1.3.2      2020-02-13 [1] CRAN (R 3.6.0)                     
#>  purrr       * 0.3.4      2020-04-17 [1] CRAN (R 3.6.0)                     
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 3.6.0)                     
#>  Rcpp          1.0.4.6    2020-04-09 [1] CRAN (R 3.6.0)                     
#>  readr       * 1.3.1      2018-12-21 [1] CRAN (R 3.6.0)                     
#>  readxl        1.3.1      2019-03-13 [1] CRAN (R 3.6.0)                     
#>  remotes       2.1.1      2020-02-15 [1] CRAN (R 3.6.0)                     
#>  reprex        0.3.0      2019-05-16 [1] CRAN (R 3.6.0)                     
#>  rlang         0.4.5      2020-03-01 [1] CRAN (R 3.6.0)                     
#>  rmarkdown     2.1        2020-01-20 [1] CRAN (R 3.6.0)                     
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.0)                     
#>  rstudioapi    0.11       2020-02-07 [1] CRAN (R 3.6.0)                     
#>  rvest         0.3.5      2019-11-08 [1] CRAN (R 3.6.0)                     
#>  scales        1.1.0      2019-11-18 [1] CRAN (R 3.6.0)                     
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)                     
#>  stm           1.3.5      2019-12-17 [1] CRAN (R 3.6.0)                     
#>  stringi       1.4.6      2020-02-17 [1] CRAN (R 3.6.0)                     
#>  stringr     * 1.4.0      2019-02-10 [1] CRAN (R 3.6.0)                     
#>  testthat      2.3.2      2020-03-02 [1] CRAN (R 3.6.0)                     
#>  tibble      * 3.0.1      2020-04-20 [1] CRAN (R 3.6.2)                     
#>  tidyr       * 1.0.2      2020-01-24 [1] CRAN (R 3.6.0)                     
#>  tidyselect    1.0.0.9000 2020-05-03 [1] Github (r-lib/tidyselect@a63e13d)  
#>  tidyverse   * 1.3.0      2019-11-21 [1] CRAN (R 3.6.0)                     
#>  usethis       1.6.0      2020-04-09 [1] CRAN (R 3.6.0)                     
#>  vctrs         0.2.4      2020-03-10 [1] CRAN (R 3.6.0)                     
#>  withr         2.2.0      2020-04-20 [1] CRAN (R 3.6.2)                     
#>  xfun          0.13       2020-04-13 [1] CRAN (R 3.6.2)                     
#>  xml2          1.3.2      2020-04-23 [1] CRAN (R 3.6.2)                     
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 3.6.0)                     
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

The current Git commit details are:

#> Local:    master /Users/bmarwick/Desktop/archyconfgender
#> Remote:   master @ origin (https://github.com/yichun33/archyconfgender)
#> Head:     [2656ad9] 2020-05-04: Merge branch 'master' of https://github.com/yichun33/archyconfgender